-
-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use bufio.Writer instead of binary.Write to reduce memory footprint while serializing #103
use bufio.Writer instead of binary.Write to reduce memory footprint while serializing #103
Conversation
Your claim, as I understand it, is that binary.Write and binary.Read allocates buffers and is inefficient. Nothing in your analysis supports this. You are not providing any evidence that the new code will be more performant. You claim "current implementation bufio.Reader is more memory efficient than binary.Read for large set" but you do not provide evidence or even a rationale for why that might be. You need to provide strong and detailed evidence. Our code is default textbook code. Before we adopt something more complicated, we need to have a deep understanding and much evidence. |
In the implementation of binary.Write, its clear that binary.Write will allocates another []byte with same size with the input. I ran a benchmark to see how much binary.Write allocated compare to buffio.Write
As you can see, the binary version is faster but use 2x memory. Note that this is only true for large data.
|
Thanks. @josharian : can you comment? |
I'm fighting some fires on other fronts, so this is based on only a cursory look... Yes, binary.Write allocates the entire buffer up front. This is arguably a bug: It could do internal batching for large slices. I'm not sure how much excitement there'd be upstream for fixing it. The proposed replacement seems fine, in broad strokes. bufio.Writer doesn't really add all that much, since the relevant sizes can be calculated in advance; you could manage a single buffer yourself just as easily. Or if the only goal is to avoid gigantic allocs, you could call binary.Write chunk by chunk. But this also seems reasonable enough. Let me know if you'd like me to take a more careful look at the code. |
bitset.go
Outdated
// binary.Write for large set | ||
writer := bufio.NewWriter(stream) | ||
var item = make([]byte, 8) // for serializing uint64 | ||
var n = binary.Size(uint64(0)) // number of bytes written |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems odd, doesn't it? You assume that the size is 8, and then you next compute it? Do I get this right?
bitset.go
Outdated
if err != nil { | ||
return int64(n+nn), err | ||
} | ||
n += nn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would you do all of these additions?
bitset.go
Outdated
reader := bufio.NewReader(stream) | ||
i := 0 | ||
var item = make([]byte, 8) // one uint64 | ||
for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not go with the standard...
for i:=0 ....
@thanhpk Let us push this forward. Please have a look at my comments.
It is disappointing code. |
I suspect that the upstream response is: "binary.Write is not meant for giant data structures; for very large things you probably want enough control that you should write it yourself". But don't take my word for it: File an issue. I don't recall having seen anything specifically about this; the closest I recall are golang/go#27757 and golang/go#27403. |
@lemire I fixed the issues you commented. |
Merged. |
While writing (or reading) a very big bitset (few GBs) into a file, the memory usage will be doubled, crashing the program.
This is because you are converting the whole set into another big
[]byte
slice. (throughbinary.Write
)This pull request fixs the issue by not trying to convert the whole set (
[]uint64
) into[]byte
, but convert one by one item and push it to a buffered stream.The memory usage is the same after calling
WriteTo
orReadFrom
Refs:
bufio
is the recomented way to read and writing buffered data. (https://pkg.go.dev/bufio#pkg-overview)A loop while reading out from
bufio.Reader
is also quite common (https://pkg.go.dev/bufio#example-Scanner.Bytes)